Enjoy The Paper: Lexical Semantics Via Lexicology
Authors
Abstract
Current research being undertaken at both Cambridge and IBM is aimed at the construction of substantial lexicons containing lexical semantic information capable of use in automated natural language processing (NLP) applications. This work extends previous research on the semi-automatic extraction of lexical information from machine-readable versions of conventional dictionaries (MRDs) (see e.g. the papers and references in Boguraev & Briscoe, 1989; Walker et al., 1988). The motivation for this and previous research using MRDs is that entirely manual development of lexicons for practical NLP applications is infeasible, given the labour-intensive nature of lexicography (e.g. Atkins, 1988) and the resources likely to be allocated to NLP in the foreseeable future. In this paper, we motivate a particular approach to lexical semantics, briefly demonstrate its computational tractability, and explore the possibility of extracting the lexical information this approach requires from MRDs and, to some extent, textual corpora.
1. Lexical Semantics
A theory of lexical semantics should provide an efficient representation of lexical semantic information in the paradigmatic plane which is capable of integrating with a genuinely compositional semantic account in the syntagmatic plane. Our starting point for this research is the work of Levin (e.g. 1985) and others on verbal alternations (diathesis), Pustejovsky (e.g. 1989) on lexical coercion and qualia theory, and Evans & Gazdar (e.g. 1989) on default inheritance within unification-based formalisms. Our approach can be seen as a further contribution to the use of unification-based formalisms in linguistic description and specifically as an enriching of the minimal sort-based lexical semantic taxonomy incorporated into the Esprit ACORD system (Moens et al., 1989) and the SRI (Cambridge) CLE system (Alshawi et al., 1989).
We propose a system in which a standard graph-based unification formalism, such as PATR-II, is augmented with minimal disjunction (of atomic terms) and minimal default inheritance (allowing only 'orthogonal' multiple inheritance in a manner similar to Evans & Gazdar's DATR). Using such a system we are able to see the beginnings of solutions to three problems concerning the integration of lexical semantics with a general theory of linguistic description and processing: alternations, coercion, and decomposition / representation. The first problem emerges with systems, such as the Alvey Tools grammar (Carroll & Grover, 1989), which attempt to characterise the grammatical behaviour of lexical items in terms of sets of subcategorisation frames. Intuitively, this often seems arbitrary and inelegant because the occurrence of alternation seems to be semantically motivated. This problem has been discussed mostly in connection with verbs, but also arises with nouns and adjectives. For instance, in the Tools lexicon the verb believe has eight entries. Six of these separate entries relate to the same or a very similar sense of believe; namely, believe3 (Longman Dictionary of Contemporary English, LDOCE) 'to hold as an opinion; suppose', which is a relation between an individual (the believer) and a proposition (what is believed). Treating the various grammatical realisations of this sense of believe separately predicts that it is pure accident that they share the same sense. It also suggests that the range of possible alternations is unpredictable and must simply be listed from verb to verb. Most of the work on alternations has concentrated on attempts to characterise semantic classes of verbs which undergo similar alternations (e.g. Levin, 1985). This enterprise has not been particularly successful (Boguraev & Briscoe, 1989b), but in any case it ignores or simply assumes the prior point that it is possible to construct a system in which there is just one entry for believe3.
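The kind of formalism proposed above can be illustrated with a small sketch. It is not PATR-II or DATR and is not taken from the paper: feature structures are plain nested dictionaries without reentrancy, a disjunction of atomic terms is a `frozenset` of strings, and 'orthogonal' multiple inheritance is approximated by forbidding two parents from defining the same top-level feature. All names (`unify`, `inherit`, `VERB`, `PROP_ATTITUDE`, the feature labels) are assumptions made for illustration.

```python
class UnificationFailure(Exception):
    """Raised when two feature values carry incompatible information."""


def _as_set(value):
    # Treat a single atom as a one-member disjunction.
    return value if isinstance(value, frozenset) else frozenset([value])


def unify(a, b):
    """Unify two values: dicts are feature structures, frozensets are
    disjunctions of atoms, strings are atoms. Inputs are not modified."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for feature, b_value in b.items():
            out[feature] = unify(a[feature], b_value) if feature in a else b_value
        return out
    if isinstance(a, dict) or isinstance(b, dict):
        raise UnificationFailure("cannot unify an atom with a feature structure")
    common = _as_set(a) & _as_set(b)
    if not common:
        raise UnificationFailure(f"{a!r} and {b!r} share no atomic value")
    # A disjunction narrowed down to one atom becomes that atom.
    return next(iter(common)) if len(common) == 1 else common


def inherit(own, *parents):
    """Default inheritance: parents fill in features the entry leaves
    unspecified; the entry's own values override inherited defaults."""
    defaults = {}
    for parent in parents:
        clash = defaults.keys() & parent.keys()
        if clash:
            # Simplified orthogonality check (top-level features only).
            raise ValueError(f"parents are not orthogonal on {sorted(clash)}")
        defaults.update(parent)
    out = dict(defaults)
    for feature, value in own.items():
        if isinstance(value, dict) and isinstance(out.get(feature), dict):
            out[feature] = inherit(value, out[feature])
        else:
            out[feature] = value
    return out


# Hypothetical classes: one entry for believe3 with a disjunctive complement.
VERB = {"cat": "verb"}
PROP_ATTITUDE = {
    "sem": {"arg1": "individual", "arg2": "proposition"},
    "comp": frozenset({"that_s", "np"}),
}
believe = inherit({"orth": "believe"}, VERB, PROP_ATTITUDE)
```

Under these assumptions a single entry covers several realisations: `unify(believe, {"comp": "np"})` narrows the disjunction to the NP frame, while an unlisted frame such as `{"comp": "pp"}` raises `UnificationFailure`.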
Nevertheless, it seems correct that examples like John believed that Mary was clever / Mary (to be) clever / Mary / the rumour should be related to one entry for believe because this would allow us to account for the interpretation of John believed Mary as something like 'John believed something(s) that Mary asserted'; that is, as standing for some 'understood' proposition involving Mary. Pustejovsky (1989b) refers to this process as coercion and compares it to examples such as John considers Mary a genius where it is usual (e.g. in GPSG, Gazdar et al., 1985) to claim that a genius functions predicatively because the subcategorisation frame for consider forces this interpretation. In general, coercion is a problem in theories which take the syntactic aspect of grammatical realisation as primary, but would be a natural consequence of a theory which took the sense, and the fact that believe3 is a relation between an individual and a proposition, as basic. In such an account an NP complement of a verb denoting a relation between an individual and a proposition would either denote a proposition 'directly' (the rumour) or be coerced to the appropriate semantic type (Mary). When coercion occurs some additional information is required to 'flesh out' the elevated semantic type of the complement. Pustejovsky (1989) dubs this logical metonymy. In the case of believed Mary this is that it is some assertion of Mary's which is believed. This information appears to be inherited from the verb. In other cases, such as John enjoyed (watching) the film, John began (reading) the book, or John finished (drinking) the beer, it is more plausible that the missing information is provided by the lexical specification of the NP complements (cf. John enjoyed (drinking) the beer, John finished (reading) the book).
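The two sources of 'missing' information described above can be sketched as a small lookup. This is an illustrative toy, not the authors' implementation: the classes `Verb` and `Noun`, the fields `default_process` and `typical_process`, the semantic type labels, and the rule that a verb-supplied process takes priority over a noun-supplied one are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Noun:
    form: str
    sem_type: str                           # e.g. "proposition", "individual"
    typical_process: Optional[str] = None   # hypothetical: process associated with the object


@dataclass(frozen=True)
class Verb:
    form: str
    complement_type: str                    # semantic type the verb requires
    default_process: Optional[str] = None   # hypothetical: supplied by the verb under coercion


def interpret(verb: Verb, noun: Noun) -> str:
    """Return a toy logical form for verb + NP complement."""
    if noun.sem_type == verb.complement_type:
        # The NP denotes the required type 'directly'; no coercion needed.
        return f"{verb.form}({noun.form})"
    # Coercion: flesh out the complement with a process, taken from the
    # verb if it supplies one, otherwise from the noun (assumed ordering).
    process = verb.default_process or noun.typical_process
    if process is None:
        raise ValueError(f"cannot coerce {noun.form!r} to {verb.complement_type}")
    return f"{verb.form}({process}({noun.form}))"


believe = Verb("believe", "proposition", default_process="assert")
enjoy = Verb("enjoy", "event")
finish = Verb("finish", "event")

rumour = Noun("rumour", "proposition")
mary = Noun("Mary", "individual")
film = Noun("film", "artifact", typical_process="watch")
beer = Noun("beer", "artifact", typical_process="drink")
book = Noun("book", "artifact", typical_process="read")

print(interpret(believe, rumour))  # no coercion
print(interpret(believe, mary))    # process inherited from the verb
print(interpret(enjoy, film))      # process supplied by the noun
```

In this sketch the same verb entry handles both the directly proposition-denoting complement and the coerced one, so no separate subcategorisation entry is needed for each realisation.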
Pustejovsky (1989, 1989b) and Pustejovsky & Anick (1988) propose that the lexical representation of nouns is enriched to include a specification of processes typically associated with the objects they denote and that, in cases of coercion, this information is utilised. In their terms, this is the telic role of the qualia structure of the noun. We see the inheritance of this information from the verb or complement as a default process which operates in the absence of more marked pragmatic information. For example, one would normally enjoy (watching) the play, but it would not be difficult to construct a discourse context in which someone (say lecturer or student) enjoyed
Similar resources
Exploring Linguistic Creativity via Predictive Lexicology
Creativity is not just a matter of generation, but of interpretation, since for creativity to be recognized, it must be interpreted (and not rejected) by other agents. In the domain of lexical creativity, which concerns the generation of innovative word forms, we can describe two kinds of creative process: explanatory lexicology, in which the lexical creativity of others is appreciated and unde...
On Ways Words Work Together - Topics in Lexical Combinatorics
The domain of lexical combinatorics has received much interest over the last years, in syntax, lexical semantics and lexicology, but also in lexicography, terminology, terminography and in Natural Language Processing (NLP). If the field of combinatorics can maybe trivially be defined by the fact that it deals with syntagmatic combination phenomena involving two or more lexemes, it is much harde...
DRAFT Automatic Creation of Lexical Knowledge Bases: New Developments in Computational
Text processing technologies require increasing amounts of information about words and phrases to cope with the massive amounts of textual material available today. Information retrieval search engines provide greater and greater coverage, but do not provide a capability for identifying the specific content that is sought. Greater reliance is placed on natural language processing (NLP) technolo...
Lexical Semantics and Selection of TAM in Bantu Languages: A Case of Semantic Classification of Kiswahili Verbs
The existing literature on Bantu verbal semantics demonstrated that inherent semantic content of verbs pairs directly with the selection of tense, aspect and modality formatives in Bantu languages like Chasu, Lucazi, Lusamia, and Shiyeyi. Thus, the gist of this paper is the articulation of semantic classification of verbs in Kiswahili based on the selection of TAM types. This is because the sem...
Can we determine the semantics of collocations without using semantics?
The extraction of collocations from corpora has been actively worked on since the late eighties. However, so far, an important task of collocation processing, namely the semantic interpretation of the collocate, did not receive much attention, although the semantics of a given word when used as collocate very often varies from the semantics of this word when used in a free co-occurrence. In thi...
Memory-Based Lexical Acquisition and Processing
Current approaches to computational lexicology in language technology are knowledge-based (competence-oriented) and try to abstract away from specific formalisms, domains, and applications. This results in severe complexity, acquisition and reusability bottlenecks. As an alternative, we propose a particular performance-oriented approach to Natural Language Processing based on automatic memory-b...